Abstract: Hadoop provides a distributed file system and a framework for the analysis and transformation of very large data sets using MapReduce. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. Over the last few years, organizations across public and private sectors have made a strategic decision to turn big data into competitive advantage. The challenge of extracting value from big data is similar in many ways to the age-old problem of distilling business intelligence from transactional data. At the heart of this challenge is the process used to extract data from multiple sources, transform it to fit analytical needs, and load it into a data warehouse for subsequent analysis, a process known as “Extract, Transform, Load” (ETL). The nature of big data requires that the infrastructure for this process scale cost-effectively. Apache Hadoop has emerged as the de facto standard for managing big data. In the past few years, the Hadoop Distributed File System (HDFS) has been adopted by many organizations with gigantic data sets and streams of operations on them. HDFS provides distinct features such as high fault tolerance and scalability. However, the NameNode machine is a single point of failure (SPOF) for an HDFS cluster: if the NameNode machine fails, the system must be restarted manually, which makes the system less available.
Keywords: Big data, Hadoop, HDFS.